Abstract: The exponential growth of online material requires effective and
precise recommendation systems in order to optimize the user experience.
Nevertheless, conventional approaches frequently encounter problems such as
cluster overlap, which reduces the accuracy of suggestions. This paper presents
a new method for minimising the overlap between clusters in movie recommendation
systems. It achieves this by combining Improved Kohonen Self-Organizing Maps
(IKSOM) with Silhouette Clustering. The proposed method utilises IKSOM to
efficiently represent high-dimensional user-item interactions in a two-dimensional
space, enabling the formation of distinct and meaningful clusters. Subsequently,
Silhouette Clustering is utilised to enhance the separation and cohesion of clusters,
hence reducing overlap. The experimental findings show that the proposed hybrid
model works much better than the baseline techniques, obtaining an RMSE of 0.423
and an MAE of 0.216. It also improves precision (93.6%), recall (94.2%) and
F1-score (93.4%). Additionally, the proposed technique demonstrates a high level
of accuracy (97.3%) with a precision rate of 95.8%. These results emphasise the
method's efficacy in minimising errors and enhancing the overall performance of
the recommendation system. The results indicate that combining IKSOM with
Silhouette Clustering can improve the precision and dependability of movie
recommendation systems by resolving cluster overlap and offering more
individualised user experiences. Subsequent research will investigate the
implementation of this method in different fields and the integration of
supplementary contextual information to enhance the accuracy of recommendations.

Keywords: IKSOM (Improved Kohonen Self-Organizing Maps), Movie …
Introduction

With the surge of online content, developing efficient
and accurate recommendation systems is crucial to
enhance user experience. Movie recommendation systems
in particular face challenges such as data sparsity, scalability,
and cold-start issues. Traditional methods often suffer from
cluster overlap, which reduces recommendation precision and
effectiveness (Awan et al., 2021; Alatrash et al., 2023).
Advancements in machine learning and clustering
algorithms offer solutions to these challenges. Kohonen
Self-Organizing Maps (SOM) are popular for clustering
high-dimensional data due to their ability to preserve
topological properties (Guo et al., 2019). However,
traditional SOM approaches can still experience cluster
overlap, resulting in less distinct clusters.

This study introduces an Improved Kohonen Self-Organizing
Map (IKSOM) combined with Silhouette Clustering to reduce
cluster overlap in movie recommendation systems. IKSOM
effectively maps high-dimensional user-item interactions onto
a two-dimensional space, enhancing cluster distinctness.
Silhouette Clustering further optimizes cluster separation,
reducing overlap and improving recommendation accuracy
(Algahtani, 2023; Alatrash and Priyadarshini, 2023). The
hybrid approach leverages IKSOM and Silhouette
Clustering to deliver more personalized and precise movie
recommendations. Experimental results show significant
improvement over baseline models, achieving an RMSE
of 0.205, MAE of 0.042, precision of 93.6%, recall of
94.2%, and an F1-score of 93.4%. These results
underscore the potential of advanced clustering techniques
to enhance recommendation systems and address
traditional limitations (Awasthi and Goel, 2024). AI
applications in recommender systems are gaining traction
for handling large datasets and providing accurate
predictions. Algahtani (2023) reviewed AI in carbon
accounting and firm performance, highlighting its role in
improving accuracy. Similarly, Alhiawi et al. (2022) surveyed
recommender systems' objectives and evaluation
methodologies, providing insights into the field. Sentiment
analysis and deep learning have also been used to improve
recommendation accuracy. Alatrash et al. (2021) employed
sentiment analysis for e-learning recommendations,
showcasing deep learning's ability to understand user
preferences. This study builds on such advancements to
propose a novel approach for reducing cluster overlap in
movie recommendations.

By focusing on reducing cluster overlap and enhancing
recommendation accuracy, this study contributes to
ongoing efforts to develop more effective
recommendation systems. Future research will explore
applying this approach to other domains and integrating
additional contextual data to further improve
recommendation quality.

The contributions of this study on movie
recommendation systems are as follows:

# Enhanced Singular Value Decomposition (SVD) 
Implementation: An upgraded SVD version was 
applied to reduce the dimensionality of the 
MovieLens_25M dataset. This improvement retains 
crucial information and enhances dataset management 
efficiency, providing a robust foundation for future 
research. 

# Content-Driven KNN Approach: Developed a Content- 
Driven KNN technique that evaluates movies based on 
user ratings, release years, and textual descriptions 
using cosine similarity. This approach enables a more 
nuanced and comprehensive assessment of movie 
similarities, enhancing recommendation precision. 

# IKSOM Algorithm with EISEN Cosine Correlation
Distance: Introduced the IKSOM technique, effectively
reducing cluster overlap using the EISEN cosine
correlation distance. This method improves the clarity
and accuracy of clustering algorithms, essential for
effective recommendation systems.

# Silhouette Clustering for Enhanced Cluster Analysis:
Utilized silhouette clustering techniques to identify the
optimal number of clusters, improving the precision
and significance of movie categorization. This
optimization leads to more accurate and meaningful
recommendations.

# Advanced User-Movie Matrices: Developed advanced
user-movie matrices using SVD collaborative filtering,
customizing matrices to prioritize movie
recommendations based on unique user profiles and
interactions.

# Performance Improvements of the Hybrid Technique:
The experimental findings show that the proposed hybrid
model works much better than the baseline techniques,
obtaining an RMSE of 0.423 and an MAE of 0.216.
It also improves precision (93.6%), recall
(94.2%), and F1-score (93.4%). Additionally, the
proposed technique demonstrates a high level of
accuracy (97.3%) with a precision rate of 95.8%.

This study not only advances the field of movie 
recommendation systems but also lays the groundwork for 
future research that could integrate additional contextual 
information. These findings have the potential to make 
recommendations more personalized and precise, resulting 
in increased user satisfaction and engagement (Awan et al., 
2021; Alatrash et al., 2023; Guo et al., 2019). 
Mathematical Model of Hybrid Recommendation 
Systems 
Integrated Model Structure 

Hybrid recommendation models can integrate collaborative
filtering (CF) and content-based filtering (CBF) in several
ways, including weighted, mixed, feature combination, and
model combination approaches. For a succinct mathematical
representation, we will consider a weighted hybrid model,
which linearly combines the outputs of both the CF and CBF
components.
Collaborative Filtering Component 

Let $R$ denote the user-item interaction matrix, with CF
approximating this matrix using matrix factorization:

$$R \approx U V^{T}$$

where $U$ is the user-factor matrix and $V$ is the item-factor
matrix, representing latent factors from user preferences and
item characteristics, respectively.

Content-Based Filtering Component 

Each item $i$ is associated with a feature vector $x_i$. The
user profile $p_u$ for user $u$ in CBF is defined based on their
preferences:

$$p_u = \sum_{i \in N_u} r_{ui}\, x_i$$

where $N_u$ includes the items rated by the user and $r_{ui}$ are
the respective ratings.
Similarity and Prediction: 

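The predicted rating $\hat{R}_{ui}$ by the hybrid system for an
item $i$ by a user $u$ can be calculated as a weighted sum of
the CF and CBF predictions:

$$\hat{R}_{ui} = \alpha\,(u_u \cdot v_i) + (1 - \alpha)\,(p_u \cdot x_i)$$

where $u_u$ and $v_i$ are the latent vectors of user $u$ and item $i$
(rows of $U$ and $V$), and $\alpha$ is a weighting parameter that
balances the influence of CF and CBF components.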
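The profile and weighted-prediction formulas can be illustrated with a minimal sketch. The function names, the toy feature vectors, and the value of alpha below are hypothetical choices for illustration; they are not taken from the study.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def user_profile(ratings, item_features):
    """Content-based profile: sum over rated items of rating * feature vector."""
    dim = len(next(iter(item_features.values())))
    profile = [0.0] * dim
    for item, r in ratings.items():
        for k, x in enumerate(item_features[item]):
            profile[k] += r * x
    return profile


def hybrid_predict(u_vec, v_vec, p_vec, x_vec, alpha):
    """Weighted hybrid score: alpha * CF term + (1 - alpha) * CBF term."""
    return alpha * dot(u_vec, v_vec) + (1 - alpha) * dot(p_vec, x_vec)


# Toy data (hypothetical): two items with two content features each.
item_features = {"m1": [1.0, 0.0], "m2": [0.0, 1.0]}
p_u = user_profile({"m1": 4.0, "m2": 2.0}, item_features)
score = hybrid_predict([0.5, 1.0], [2.0, 1.0], p_u, item_features["m1"], alpha=0.5)
```

Setting alpha to 1 reduces the score to the pure CF term, and setting it to 0 reduces it to the pure CBF term.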
Optimization and Learning: 

The parameters of the model, including user and item
latent factors in CF, user profiles in CBF, and the
weighting parameter $\alpha$, are typically learned by
minimizing the prediction error across all known user-item
ratings:

$$\min \sum_{(u,i) \in D} \left(R_{ui} - \hat{R}_{ui}\right)^2 + \lambda \left(\|U\|^2 + \|V\|^2 + \|p\|^2\right)$$

where $D$ represents the set of all user-item pairs with
known ratings, and $\lambda$ is a regularization parameter to
prevent overfitting.
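As a rough illustration of this objective, the sketch below evaluates the squared-error term plus the L2 penalty for given parameters. It only computes the loss; how the study actually optimizes it is not stated, and the argument names are assumptions.

```python
def sq_norm(rows):
    """Squared Frobenius norm of a list of vectors."""
    return sum(x * x for row in rows for x in row)


def objective(known, U, V, P, X, alpha, lam):
    """Squared error over known ratings plus L2 regularization.

    known: dict mapping (user_index, item_index) -> observed rating
    U, V:  latent factor rows; P: user profile rows; X: item feature rows
    """
    error = 0.0
    for (u, i), r in known.items():
        cf = sum(a * b for a, b in zip(U[u], V[i]))
        cbf = sum(a * b for a, b in zip(P[u], X[i]))
        pred = alpha * cf + (1 - alpha) * cbf
        error += (r - pred) ** 2
    return error + lam * (sq_norm(U) + sq_norm(V) + sq_norm(P))


# One user, one item, one latent/content dimension (toy values).
loss = objective({(0, 0): 3.0}, [[1.0]], [[2.0]], [[1.0]], [[3.0]], alpha=0.5, lam=0.1)
```

A larger lam shrinks the factors and profiles more strongly, trading training error for less overfitting.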


Literature review 

The development and enhancement of movie
recommendation systems have been the focus of numerous
studies. Hwang and Park (2022) explored the utilization of
actor-based matrix computations, highlighting the impact
of actor attributes on recommendation accuracy. Similarly,
Kharita et al. (2022) introduced an item-based
collaborative filtering method for real-time applications,
showcasing its efficiency. Ko et al. (2022) provided an
extensive survey on recommendation systems,
emphasizing the benefits of hybrid approaches that
combine collaborative and content-based methods. Konar
et al. (2022) investigated various learning rate scheduling
techniques for convolutional neural networks, which are
crucial for optimizing deep learning models used in
recommendation systems. The security of data in
recommendation systems was addressed by Jain and
Thada (2024), who proposed efficient machine-learning
techniques for data protection. Kumar and Lehal (2023)
presented a hybrid approach for complex layout detection
in newspapers, demonstrating the versatility of deep
learning applications. Kaur (2023) focused on enhancing
performance and accuracy in skin disease detection using
deep learning methodologies that can also improve feature
extraction in recommendation systems. Kudori (2021)
proposed a hybrid method for event recommendation on
mobile devices, which is relevant to personalized movie
recommendations.

Li et al. (2022) introduced a sentiment-aware neural
recommendation model that integrates sentiment analysis
from reviews to improve accuracy. Mehfooza and Basha
(2021) developed an automated prescriptive data
pre-processing algorithm, emphasizing the importance of
pre-processing in recommendation system performance.

Mishra et al. (2024) compared various strategies in 
pneumonia detection using deep learning, illustrating the 
applicability of these techniques to enhance clustering and 
classification tasks in movie recommendation systems. 
Otter et al. (2022) surveyed the uses of deep learning for 
natural language processing, highlighting advancements 
that can be applied to recommendation algorithms. 

Ozyurt (2022) discussed efficient feature selection for 
remote sensing image recognition using deep learning 
techniques that can enhance recommendation system 
performance. Ricci et al. (2011) provided a foundational 
overview of recommendation systems, discussing various 
methodologies and applications. 

Sarkar et al. (2022) explored intelligent
recommendation frameworks for tourism using big data,
which can be adapted for movie recommendation systems
to improve personalization. Sharma and Arya (2022)
discussed UAV-based long-range environment monitoring
with Industry 5.0 perspectives, suggesting adaptations for
smart movie recommendation systems. Sharma et al.
(2022) highlighted deep learning-based intelligent
communication systems using big data analytics, relevant
for advancing recommendation systems.


Figure 1 presents a data size analysis of the literature review papers.
The bar chart depicts the spectrum of data sizes employed
in a compilation of literature reviews across diverse study
domains. Each bar represents the estimated size of data,
measured in megabytes, utilised in distinct studies. This
provides information about the computational
requirements and range of data used in various research
methods and fields of study.


Proposed method 

The proposed system uses an enhanced version of
Singular Value Decomposition (SVD) that effectively
reduces the dimensionality of the dataset. This
methodological improvement not only preserves important
information but also greatly enhances the organisation and
effectiveness of the dataset, creating a strong basis for
future analytical operations. The proposed system presents a
new Content-Driven KNN technique that uses cosine
similarity to evaluate films based on a combination of user
ratings, release years, and textual descriptions. This approach
allows for a sophisticated and comprehensive assessment of
film similarities, which is calibrated to enhance the precision
of the recommendations offered. The IKSOM algorithm has
been implemented, using the EISEN cosine correlation
distance, to efficiently reduce cluster overlap. This
application improves the clarity and accuracy of the
clustering process, which is a crucial element for the
effectiveness of any recommendation system. Optimisation
using Silhouette Techniques: Through the utilisation of
silhouette clustering techniques, the proposed algorithm has
identified the optimal number of clusters, hence improving
the accuracy and significance of movie categorizations.
This strategic optimisation enables the provision of more
focused and significant recommendations. The
combination of the previously stated methods has made it
possible to create complex user-movie matrices. By
employing Singular Value Decomposition (SVD) inside a
collaborative filtering framework, these matrices are
carefully customised to prioritise movie recommendations
according to specific user profiles and their interaction
history. The proposed hybrid model works much better
than the baseline techniques, obtaining an RMSE of 0.423
and an MAE of 0.216. It also improves precision
(93.6%), recall (94.2%), and F1-score (93.4%).
Additionally, the proposed technique demonstrates a high
level of accuracy (97.3%).

Figure 1. Comparison of Data Sizes in Literature Review Papers.

Proposed System Framework 

This Proposed System Framework presents an 
innovative hybrid recommendation system specifically 
designed for suggesting films, utilising the 
MovieLens_25M dataset. The system employs a range of 
sophisticated techniques to greatly improve the accuracy 
and customisation of movie recommendations. The 
proposed system's framework integrates multiple crucial 
breakthroughs and enhancements, each meticulously 
crafted to optimise distinct facets of the recommendation 
process: 

The foundation of our framework is an optimised
version of Singular Value Decomposition (SVD). This
technique is crucial for efficiently lowering the
dimensionality of the MovieLens_25M dataset. By
refining SVD, the proposed algorithm is able to retain
crucial information in the dataset while also enhancing
its ease of handling and analytical usefulness. This
fundamental improvement offers a strong foundation for
all subsequent data processing and analysis within the
system.
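The text does not specify how the SVD is enhanced, so the following is only a sketch of the plain underlying idea: it estimates the leading singular triplet of a small matrix by power iteration and projects each row onto that direction. The function names and the toy matrix are assumptions made for illustration.

```python
import math


def leading_singular_triplet(A, iters=200):
    """Leading (sigma, u, v) of A via power iteration on A^T A.

    Assumes A is non-zero and the start vector is not orthogonal to v.
    """
    n_rows, n_cols = len(A), len(A[0])
    v = [1.0 / math.sqrt(n_cols)] * n_cols
    for _ in range(iters):
        Av = [sum(a * x for a, x in zip(row, v)) for row in A]
        w = [sum(A[r][c] * Av[r] for r in range(n_rows)) for c in range(n_cols)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    Av = [sum(a * x for a, x in zip(row, v)) for row in A]
    sigma = math.sqrt(sum(x * x for x in Av))
    u = [x / sigma for x in Av]
    return sigma, u, v


def project_rows(A, v):
    """One-dimensional coordinate of every row of A along direction v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


R = [[2.0, 2.0], [2.0, 2.0]]          # toy rating matrix
sigma, u, v = leading_singular_triplet(R)
coords = project_rows(R, v)           # reduced (rank-1) representation of each row
```

Keeping more than one singular direction would require deflation or a library routine; a single direction is shown here only to keep the sketch short.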
Our method incorporates a unique component called
the Content-Driven KNN technique, which uses cosine
similarity to assess movies. This approach evaluates films
by using a wide range of criteria, such as user ratings,
release years, and textual descriptions. By employing a
multi-faceted method, a more nuanced and extensive
assessment of film similarities can be achieved, resulting
in more accurate recommendations that are customised to
particular user interests.
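A minimal sketch of a cosine-similarity nearest-neighbour lookup is given below. The layout of the feature vector (scaled mean rating, scaled release year, two text-term weights) and the movie identifiers are hypothetical; the study does not state how these features are encoded.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def nearest_movies(target, features, k):
    """Return the k movie ids most similar to `target`, most similar first."""
    scores = [(cosine_similarity(features[target], vec), mid)
              for mid, vec in features.items() if mid != target]
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [mid for _, mid in scores[:k]]


# Hypothetical vectors: [mean rating, release year, text term 1, text term 2], all scaled.
features = {
    "A": [0.9, 0.8, 1.0, 0.0],
    "B": [0.8, 0.9, 0.9, 0.1],
    "C": [0.2, 0.1, 0.0, 1.0],
}
neighbours = nearest_movies("A", features, k=1)
```

Because cosine similarity ignores vector length, the features should be scaled to comparable ranges before they are combined.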

Our system incorporates the IKSOM technique, which 
uses the EISEN cosine correlation distance, to enhance the 
clustering process. This algorithm efficiently reduces the 
amount of overlap between clusters, which is vital for 
improving the clarity and accuracy of the clustering 
process. Enhancing clustering immediately enhances the 
effectiveness of the recommendation system by 
guaranteeing that film categorizations are both precise and 
significant. 
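The formula for the EISEN cosine correlation distance is not given in the text. The sketch below assumes the commonly used definition, one minus the absolute uncentered correlation, and pairs it with a standard SOM best-matching-unit search and weight update. It illustrates the general mechanism only and is not the authors' IKSOM implementation.

```python
import math


def eisen_cosine_distance(x, y):
    """Assumed definition: 1 - |x . y| / (||x|| * ||y||)."""
    denom = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    if denom == 0.0:
        return 1.0
    return 1.0 - abs(sum(a * b for a, b in zip(x, y))) / denom


def best_matching_unit(sample, weights):
    """Index of the map node whose weight vector is closest to `sample`."""
    return min(range(len(weights)),
               key=lambda n: eisen_cosine_distance(sample, weights[n]))


def update_node(weight, sample, learning_rate):
    """Standard SOM update: move the weight vector toward the sample."""
    return [w + learning_rate * (s - w) for w, s in zip(weight, sample)]


weights = [[1.0, 0.0], [0.0, 1.0]]     # two map nodes (toy values)
sample = [0.9, 0.2]
bmu = best_matching_unit(sample, weights)
weights[bmu] = update_node(weights[bmu], sample, learning_rate=0.5)
```

A full SOM would also update the neighbours of the best-matching unit, scaled by a neighbourhood function that shrinks over time.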

Optimization using Silhouette Techniques 

The proposed algorithm utilises silhouette clustering
techniques to ascertain the most favourable number of
clusters. This strategic decision greatly improves the
accuracy and pertinence of movie categorizations. By
precisely adjusting the sizes of clusters, it is possible to
provide more focused and significant recommendations.
This ensures that users are presented with film ideas that
are closely aligned with their individual tastes and
inclinations.
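One way to picture this selection step is to compute the mean silhouette score for several candidate partitions and keep the best one. The sketch below uses Euclidean distance and hand-written candidate labelings, both of which are assumptions for illustration; in practice each labeling would come from a clustering run with a different number of clusters.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_silhouette(points, labels):
    """Average of s = (b - a) / max(a, b); requires at least two clusters."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    total = 0.0
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:                      # singleton cluster contributes s = 0
            continue
        a = sum(euclidean(points[i], points[j]) for j in own) / len(own)
        b = min(
            sum(euclidean(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items() if other != lab
        )
        total += (b - a) / max(a, b)
    return total / len(points)


points = [[0.0, 0.0], [0.0, 1.0], [5.0, 0.0], [5.0, 1.0]]
candidates = {2: [0, 0, 1, 1], 3: [0, 0, 1, 2]}   # number of clusters -> labels
best_k = max(candidates, key=lambda k: mean_silhouette(points, candidates[k]))
```

Scores close to 1 indicate compact, well-separated clusters, while scores near 0 or below indicate overlap.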

Creating Advanced User-Movie Matrices 

The incorporation of these advanced techniques 
enables the construction of complex user-movie matrices. 
By employing Singular Value Decomposition (SVD) in a 
collaborative filtering framework, the matrices are 
carefully customised to provide higher importance to 
movie recommendations based on individual user profiles 
and interaction histories. This customised technique 
guarantees that every suggestion is meticulously 
customised to individual preferences, hence optimising 
user contentment and involvement. 


The proposed system's hybrid approach exhibits a
substantial enhancement in performance measures when
compared to existing methods. With an RMSE of 0.423,
MAE of 0.216, precision of 92.09%, recall of 93.12%, and
an F1-score of 92.15%, our system demonstrates the
efficiency of the integrated approaches and surpasses
earlier models.
Mathematical Model of Proposed Movie
Recommendations using IKSOM and Silhouette
Clustering


This algorithm aims to minimize cluster overlap in
movie recommendation systems by utilizing the Improved
Kohonen Self-Organizing Map (IKSOM) and Silhouette
Clustering techniques. The objective is to enhance the
accuracy of movie recommendations by ensuring distinct
and clearly defined clusters.

Input:

• Dataset Description:

• User-Movie Rating Matrix $R$

• Movie Feature Matrix $F$

Output:

• Optimized clusters of movies

• Enhanced movie recommendations with reduced
cluster overlap

Procedure:

1 Initialization:

• Define the set of users $U = \{u_1, u_2, \ldots, u_n\}$

• Define the set of movies $M = \{m_1, m_2, \ldots, m_p\}$

• Initialize the user-movie rating matrix $R \in \mathbb{R}^{|U| \times |M|}$

• Initialize the movie feature matrix $F \in \mathbb{R}^{|M| \times d}$, where $d$ is the number of features

2 Feature Normalization:

• Normalize the movie feature matrix $F$:

$$F_{norm}(i,j) = \frac{F(i,j) - \min(F)}{\max(F) - \min(F)}, \quad \forall i, j$$

3 IKSOM Initialization:

• Initialize the Improved Kohonen Self-Organizing
Map (IKSOM) with parameters such as grid size, initial
learning rate, and initial neighborhood radius.

4 IKSOM Training:

• Train the IKSOM using the normalized movie
feature matrix $F_{norm}$:

$$\mathrm{IKSOM}_{trained} = \text{train IKSOM}(F_{norm}, \text{learning rate}, \text{neighborhood radius})$$

5 Cluster Assignment:

• Assign each movie $m_i \in M$ to the nearest cluster
centroid in the trained IKSOM:

$$\mathrm{Cluster}(m_i) = \arg\min_{k} \| F_{norm}(m_i) - \mathrm{Centroid}_k \|^2, \quad \forall m_i \in M$$

6 Silhouette Score Calculation:

• Calculate the silhouette score for each movie $m_i \in M$
to evaluate clustering quality:

$$s(m_i) = \frac{b(m_i) - a(m_i)}{\max(a(m_i), b(m_i))}, \quad \forall m_i \in M$$

• Here, $a(m_i)$ is the average intra-cluster distance
for movie $m_i$ and $b(m_i)$ is the average nearest-cluster
distance for movie $m_i$.

7 Silhouette Clustering Optimization:

• Optimize the clusters based on silhouette scores:

$$\text{Optimize Clusters} = \arg\max_{k \in \text{clusters}} \frac{1}{|M_k|} \sum_{m_i \in M_k} s(m_i), \quad \forall k$$

8 Cluster Refinement:

• Iteratively refine the clusters to minimize overlap:

$$\text{Refine Clusters} = \arg\min \sum_{i=1}^{|M|} \| F_{norm}(m_i) - \text{New Centroid}(m_i) \|^2$$

9 Recommendation Generation:

• Generate movie recommendations based on the
refined clusters:

$$\text{Recommendations}(u_j) = \{ m_i \mid m_i \in \mathrm{Cluster}(u_j), \text{ highest rated by similar users} \}$$

Return:

• Optimized clusters of movies with reduced
overlap

• Improved movie recommendations for each user

Figure 2. Data Flow Diagram for IKSOM Method (dataflow of the IKSOM and Silhouette Clustering algorithm for movie recommendations).


The algorithm starts by normalizing the movie feature
matrix to ensure uniform scaling. The Improved Kohonen
Self-Organizing Map (IKSOM) is then initialized and
trained using these normalized features, forming initial
clusters of movies. Each movie is assigned to the nearest
cluster centroid, and silhouette scores are calculated to
evaluate clustering quality.

Silhouette clustering optimization is performed to
refine the clusters and maximize silhouette scores, thus
reducing overlap. This iterative refinement process
ensures distinct and well-defined clusters. Finally, movie
recommendations are generated for each user based on
these optimized clusters, resulting in precise and relevant
recommendations.

By leveraging IKSOM and silhouette clustering
techniques, this algorithm effectively reduces cluster
overlap and enhances the accuracy of movie
recommendations.
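The normalization and cluster-assignment steps described above can be sketched as follows. The sketch assumes a global minimum and maximum over the whole feature matrix and squared Euclidean distance to fixed centroids; the feature values and centroids are made up for illustration.

```python
def min_max_normalize(F):
    """Scale every entry of F to [0, 1] using the global min and max of F."""
    flat = [x for row in F for x in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    if span == 0:
        return [[0.0 for _ in row] for row in F]
    return [[(x - lo) / span for x in row] for row in F]


def assign_clusters(F_norm, centroids):
    """For each row, index of the centroid with the smallest squared distance."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [min(range(len(centroids)), key=lambda k: sq_dist(row, centroids[k]))
            for row in F_norm]


F = [[1.0, 2.0], [3.0, 5.0], [9.0, 7.0]]          # three movies, two features
F_norm = min_max_normalize(F)
labels = assign_clusters(F_norm, centroids=[[0.0, 0.0], [1.0, 1.0]])
```

Normalizing each feature column separately is a common alternative when features have very different ranges.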
Flow Chart 


The data flow diagram in Figure 2 for the proposed
movie recommendation algorithm, which integrates an
Improved Kohonen Self-Organizing Map (IKSOM) with
Silhouette Clustering, provides a detailed overview of the
entire process. It starts with inputting the user-movie rating
and movie feature matrices, followed by their
normalization to ensure compatibility for analysis. The
normalized data is then used to initialize and train the
IKSOM, which facilitates the accurate clustering of
movies. These clusters are assessed using silhouette scores
to gauge their quality, leading to the optimization of
clusters based on these scores. The final step involves
refining the clusters to minimize overlap and generating
movie recommendations from the optimized clusters. This
diagram clearly illustrates each stage of the algorithm,
emphasizing the iterative and sequential approach to
enhancing recommendation accuracy and reducing cluster
overlap.


Result 

The proposed algorithm aims to reduce cluster overlap
in movie recommendation systems by employing the
Improved Kohonen Self-Organizing Map (IKSOM) and
Silhouette Clustering techniques, utilizing the
comprehensive MovieLens 25M dataset. This dataset
includes 25 million ratings, 1 million tag applications,
62,000 movies, 162,000 users, and 15 million relevance
scores across 1,129 tags. The algorithm follows a
systematic approach to achieve precise and relevant movie
recommendations by ensuring distinct and well-defined
clusters.

Initially, the algorithm sets up the user set $U$, the movie
set $M$, and the tag set $T$, along with the user-movie rating
matrix $R$ and the movie feature matrix $F$. The rating matrix
$R$ and the tag application matrix $A$ are populated from the
dataset, where $R(i, j)$ represents the rating given by user $u_i$
to movie $m_j$, and $A(i, j)$ denotes the number of tag
applications by user $u_i$ for movie $m_j$. Additionally, the tag
relevance matrix $G$ is populated with the relevance scores
of tags $t_k$ for movies $m_j$.

To ensure uniform scaling, the movie feature matrix $F$
undergoes min-max normalization, resulting in the
normalized feature matrix $F_{norm}$. The IKSOM is then
initialized with parameters such as grid size, initial
learning rate, and initial neighborhood radius, and is
trained using $F_{norm}$. This training process produces a
trained IKSOM model, which assigns each movie $m_i$ to
the nearest cluster centroid by minimizing the distance
between the movie features and the cluster centroids.

Following the clustering, silhouette scores are
calculated for each movie $m_i$ to evaluate the quality of the
clustering. The silhouette score $s(m_i)$ is determined based
on the average intra-cluster distance $a(m_i)$ and the
average nearest-cluster distance $b(m_i)$, providing a
measure of how well each movie fits within its assigned
cluster compared to others. Clusters are optimized by
maximizing the average silhouette score, ensuring
enhanced cohesion within clusters and separation between
them.

The algorithm iteratively refines the clusters to 
minimize overlap by reducing the sum of squared 
distances between the normalized movie features and the 
new centroids. This iterative process guarantees distinct 
and well-defined clusters, effectively reducing overlap. 
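The refinement step can be pictured as a k-means-style loop that alternates between recomputing centroids and reassigning points, which lowers the sum of squared distances at each pass. Treating "new centroid" as the cluster mean is an assumption here, and the helper names are hypothetical.

```python
def recompute_centroids(points, labels, k):
    """Mean of the points in each cluster (empty clusters stay at the origin)."""
    dim = len(points[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p, lab in zip(points, labels):
        counts[lab] += 1
        for d in range(dim):
            sums[lab][d] += p[d]
    return [[s / counts[c] if counts[c] else 0.0 for s in sums[c]] for c in range(k)]


def sum_squared_distances(points, labels, centroids):
    """Objective being minimized: squared distance of each point to its centroid."""
    return sum(sum((x - y) ** 2 for x, y in zip(p, centroids[lab]))
               for p, lab in zip(points, labels))


def refine(points, labels, k, max_iters=20):
    """Alternate centroid update and reassignment until labels stop changing."""
    for _ in range(max_iters):
        centroids = recompute_centroids(points, labels, k)
        new_labels = [
            min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids


points = [[0.0, 0.0], [0.0, 1.0], [5.0, 0.0], [5.0, 1.0]]
start = [0, 0, 0, 1]                               # deliberately poor initial labels
before = sum_squared_distances(points, start, recompute_centroids(points, start, 2))
labels, centroids = refine(points, start, k=2)
after = sum_squared_distances(points, labels, centroids)
```

In this toy case the misplaced third point moves to the second cluster and the objective drops sharply.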

Finally, the algorithm generates personalized movie
recommendations for each user $u_j$. Recommendations are
derived from the highest-rated movies within each user's
assigned cluster, ensuring relevance and personalization.
The final output includes optimized clusters of movies
with reduced overlap and improved movie
recommendations for each user.
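A simplified sketch of this last step is shown below. It assumes that a user's cluster is the one containing most of the movies they rated, and it ranks unseen movies in that cluster by the mean rating given by other users. Both choices are illustrative assumptions, since the text does not define how users are matched to clusters or how similar users are selected.

```python
from collections import defaultdict


def recommend(user, ratings, movie_cluster, top_n=2):
    """Unseen movies from the user's dominant cluster, ranked by mean rating.

    ratings: dict user -> {movie: rating}; movie_cluster: dict movie -> cluster id.
    """
    # Assumption: the user's cluster is the one holding most of their rated movies.
    counts = defaultdict(int)
    for movie in ratings[user]:
        counts[movie_cluster[movie]] += 1
    user_cluster = max(sorted(counts), key=lambda c: counts[c])

    totals, n = defaultdict(float), defaultdict(int)
    for other, rated in ratings.items():
        if other == user:
            continue
        for movie, r in rated.items():
            if movie_cluster[movie] == user_cluster and movie not in ratings[user]:
                totals[movie] += r
                n[movie] += 1
    ranked = sorted(totals, key=lambda m: (-totals[m] / n[m], m))
    return ranked[:top_n]


ratings = {
    "u1": {"m1": 5.0, "m2": 4.0},
    "u2": {"m1": 5.0, "m3": 4.5, "m4": 2.0},
    "u3": {"m3": 3.5, "m5": 5.0},
}
movie_cluster = {"m1": 0, "m2": 0, "m3": 0, "m4": 0, "m5": 1}
suggestions = recommend("u1", ratings, movie_cluster)
```

Movie m5 is rated highly but sits in a different cluster, so it is not suggested to u1.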

In summary, this algorithm leverages IKSOM and
silhouette clustering to enhance the precision of movie
recommendations by ensuring distinct and well-defined
clusters. It addresses cluster overlap and provides relevant,
tailored recommendations, thereby optimizing user
satisfaction in movie recommendation systems.


Algorithm 1: Reducing Cluster Overlap in Movie
Recommendations using IKSOM and Silhouette
Clustering
Input:
• Dataset Description:
25 million ratings
Output:
• Optimized clusters of movies
• Improved movie recommendations with reduced
cluster overlap
Procedure:
1 Initialization:
• Let $U = \{u_1, u_2, \ldots, u_{162000}\}$ be the set of users.
• Let $M = \{m_1, m_2, \ldots, m_{62000}\}$ be the set of
movies.
• Let $T = \{t_1, t_2, \ldots, t_{1129}\}$ be the set of tags.
• Initialize the user-movie rating matrix $R \in \mathbb{R}^{|U| \times |M|}$.
• Initialize the movie feature matrix $F \in \mathbb{R}^{|M| \times d}$,
where $d$ is the number of features.
2 Data Pre-processing:
• Populate the rating matrix $R$ from the dataset:
$R(i,j)$ = rating of user $u_i$ for movie $m_j$
• Populate the tag application matrix $A \in \mathbb{R}^{|U| \times |M|}$
from the dataset:
$A(i,j)$ = number of tag applications by user $u_i$ for movie $m_j$
• Populate the tag relevance matrix $G \in \mathbb{R}^{|M| \times |T|}$
from the dataset:
$G(j,k)$ = relevance score of tag $t_k$ for movie $m_j$
3 Feature Normalization:
• Normalize the movie feature matrix $F$:
$$F_{norm}(i,j) = \frac{F(i,j) - \min(F)}{\max(F) - \min(F)}$$
4 IKSOM Initialization:
• Initialize the Improved Kohonen Self-Organizing
Map (IKSOM) with parameters:
IKSOM = {grid size, initial learning rate, initial
neighborhood radius}
5 IKSOM Training:
• Train the IKSOM using the normalized movie
feature matrix $F_{norm}$:
$$\mathrm{IKSOM}_{trained} = \text{train IKSOM}(F_{norm}, \text{learning rate}, \text{neighborhood radius})$$
6 Cluster Assignment:
• Assign each movie $m_i \in M$ to the nearest cluster
centroid in the trained IKSOM:
$$\mathrm{Cluster}(m_i) = \arg\min_{k} \| F_{norm}(m_i) - \mathrm{Centroid}_k \|^2, \quad \forall m_i \in M$$
7 Silhouette Score Calculation:
• For each movie $m_i \in M$, calculate the silhouette
score to evaluate the clustering quality:
$$s(m_i) = \frac{b(m_i) - a(m_i)}{\max(a(m_i), b(m_i))}, \quad \forall m_i \in M$$
• Where $a(m_i)$ is the average intra-cluster distance
for movie $m_i$ and $b(m_i)$ is the average nearest-cluster
distance for movie $m_i$.
8 Silhouette Clustering Optimization:
• Optimize the clusters based on silhouette scores:
$$\text{Optimize Clusters} = \arg\max_{k \in \text{clusters}} \frac{1}{|M_k|} \sum_{m_i \in M_k} s(m_i), \quad \forall k$$
9 Cluster Refinement:
• Refine the clusters iteratively to minimize
overlap:
$$\text{Refine Clusters} = \arg\min \sum_{i=1}^{|M|} \| F_{norm}(m_i) - \text{New Centroid}(m_i) \|^2$$
10 Recommendation Generation:
• For each user $u_j \in U$:
• Identify the cluster of movies assigned to the user
based on their rating patterns.
• Generate recommendations from the highest-
rated movies within the user's cluster:
$$\text{Recommendations}(u_j) = \{ m_i \mid m_i \in \mathrm{Cluster}(u_j), \text{ highest rated by similar users} \}$$
Return:
• Optimized clusters of movies with reduced
overlap.
• Improved movie recommendations for each user.
Figure 3. Density plot for ratings by genre.

The results pertaining to user and movie ratings are 
presented in Figures 2 and 3 below, along with 


suggestions. 
Table 1. Genres Dataset (MovieLens_25M dataset). 
S.No  Genres 
0     Adventure|Animation|Children|Comedy|Fantasy 
1     Adventure|Children|Fantasy 
2     Comedy|Romance 
3     Comedy|Drama|Romance 
4     Comedy 


Table 1 (Genres Dataset) and Table 2 (Rating Dataset) 
offer crucial insights into the MovieLens_25M dataset 
utilized in movie recommendation systems. Table 1 details 
the genres assigned to each movie, providing a thorough 
overview of the various movie categories present in the 
dataset. This genre information is vital for understanding 
movie distribution and for customizing recommendation 
algorithms based on users’ genre preferences. 
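
As a small illustration of how pipe-separated genre strings can be turned into numeric movie features, the sketch below builds a multi-hot matrix from the five sample rows of Table 1. The function name and the multi-hot encoding itself are assumptions; the paper does not state how the feature matrix F is constructed.

```python
def build_genre_matrix(genre_strings):
    """Return (vocabulary, multi-hot matrix) for pipe-separated genre strings."""
    parsed = [[g for g in s.split("|") if g] for s in genre_strings]
    vocab = sorted({g for genres in parsed for g in genres})
    index = {g: j for j, g in enumerate(vocab)}
    matrix = []
    for genres in parsed:
        row = [0] * len(vocab)
        for g in genres:
            row[index[g]] = 1  # 1 if the movie carries this genre
        matrix.append(row)
    return vocab, matrix

# The five sample rows shown in Table 1.
rows = [
    "Adventure|Animation|Children|Comedy|Fantasy",
    "Adventure|Children|Fantasy",
    "Comedy|Romance",
    "Comedy|Drama|Romance",
    "Comedy",
]
vocab, F = build_genre_matrix(rows)
print(vocab)
for row in F:
    print(row)
```

Each column corresponds to one genre, so a movie's row can be compared with other movies or with a user's genre profile.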

Table 2. Rating Dataset (MovieLens_25M dataset). 
S.No  Userid  Movieid  Rating 
0     1       296      5.0 
1     1       306      3.5 
2     1       307      5.0 
3     1       665      5.0 
4     1       899      3.5 


Table 2 focuses on the rating dataset, which includes 
user ratings for different movies. It features user IDs, 
movie IDs, and the associated ratings, which are essential 
for analyzing user preferences and crafting personalized 
movie recommendations. Together, these tables underpin 
the process of refining movie recommendations by 
leveraging both genre data and user feedback. 

Figure 3 illustrates that the Film-Noir and Horror 
categories consistently demonstrate high and low average 
ratings, respectively, with occasional extreme values. 
Following this observation, a density plot depicting ratings 
by genre will be generated. 

The efficacy of the proposed technique was evaluated 
using the following performance metrics: F1-score, mean 
absolute error (MAE), precision, recall, accuracy, and root 
mean square error (RMSE). An explanation of the 
performance measures may be found below. 
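
For reference, the listed measures can be computed as in the following sketch, which uses their standard definitions. The relevance threshold of 3.5 used to turn ratings into relevant and non-relevant classes is an assumption, as are the function names and the toy ratings; the paper does not state how ratings were binarized.

```python
import math

def rmse(actual, predicted):
    """Root mean square error between actual and predicted ratings."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mae(actual, predicted):
    """Mean absolute error between actual and predicted ratings."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def classification_metrics(actual, predicted, threshold=3.5):
    """Precision, recall, F1 and accuracy; ratings >= threshold count as relevant
    (the threshold is an assumed value)."""
    pairs = [(a >= threshold, p >= threshold) for a, p in zip(actual, predicted)]
    tp = sum(1 for a, p in pairs if a and p)
    fp = sum(1 for a, p in pairs if not a and p)
    fn = sum(1 for a, p in pairs if a and not p)
    tn = sum(1 for a, p in pairs if not a and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(pairs)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}

# Toy ratings (hypothetical values).
actual = [5.0, 3.0, 4.0, 2.0]
predicted = [4.5, 3.5, 3.0, 2.0]
print(rmse(actual, predicted), mae(actual, predicted))
print(classification_metrics(actual, predicted))
```

RMSE and MAE measure rating-prediction error (lower is better), whereas precision, recall, F1-score, and accuracy measure how well relevant items are identified (higher is better).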
Figure 4 shows the performance metrics for RMSE and 
MAE. The proposed hybrid technique mitigates the 
cold-start issue by using the cosine similarity of 
content-driven KNN, resulting in an RMSE of 0.423 and an 
MAE of 0.216. Ratings can therefore be predicted even 
when no prior rating information is available. As a 
consequence, by resolving the cold-start and data-sparsity 
concerns, the proposed hybrid approach may reduce RMSE 
and MAE errors. 
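
The idea of predicting a rating for a movie that has no ratings yet can be sketched as follows: the movie's content features are compared by cosine similarity with the movies a user has already rated, and the k most similar ones supply a similarity-weighted average. This is a minimal illustration under assumed names, an assumed weighting scheme, and toy data, not the authors' implementation.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def predict_rating(target_features, rated_items, k=2):
    """Similarity-weighted mean rating of the k rated movies most similar to the target.

    rated_items is a list of (feature_vector, rating) pairs for one user.
    Returns None when no rated movie shares any feature with the target.
    """
    neighbours = sorted(
        ((cosine_similarity(target_features, f), r) for f, r in rated_items),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    total = sum(s for s, _ in neighbours)
    if total == 0:
        return None
    return sum(s * r for s, r in neighbours) / total

# Toy example: genre vectors over three hypothetical genres.
new_movie = [1, 0, 0]
user_history = [([1, 0, 0], 5.0), ([0, 1, 0], 1.0), ([1, 1, 0], 3.0)]
print(predict_rating(new_movie, user_history, k=2))
```

Because only content features of the new movie are needed, a prediction is available before any user has rated it.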

Figure 5 provides a comparative analysis of the 
proposed hybrid model's performance relative to baseline 
techniques across several critical metrics. It details the 
Root Mean Square Error (RMSE), Mean Absolute Error 
(MAE), Precision, Recall, F1-Score, and Accuracy for 
both models. The proposed hybrid model shows notable 
improvements over the baseline, with a reduced RMSE of 
0.423 and MAE of 0.216, while achieving impressive 
scores in Precision (95.8%), Recall (94.2%), and F1-Score 
(93.4%). Additionally, it reaches an exceptional accuracy 
of 97.3%, highlighting the proposed model's superior 
effectiveness in enhancing performance metrics. 

The bar chart in Figure 6 illustrates a comparative 
evaluation of the accuracy reported in several research 
studies and methodologies. The accuracy rates range from 
84.2% in Vineela et al. (2022) to 97.3% for the proposed 
IKSOM and Silhouette Clustering method. The chart 
visualizes the differences in performance among the 
compared techniques (Vineela et al., 2022; 
Yavanamandha et al., 2022; Vora et al., 2024) and the 
proposed IKSOM and Silhouette Clustering method. 

The investigation determines that content-based 
filtering algorithms are not ideal due to their inefficiency 
and significant time consumption in real-time applications 
across many datasets. Therefore, it is advisable to employ 
a hybrid methodology that integrates both collaborative 
filtering and content-based filtering strategies. An 
effective approach for predicting ratings involves 
combining nearest neighbour selection with matrix 
factorization. To tackle the cold-start problem, cosine 
similarity is employed to compute user similarity and 
provide film recommendations accordingly. The hybrid 
approach being suggested surpasses earlier methods in 
terms of important metrics such as RMSE and MAE, 
showcasing exceptional accuracy, precision, recall, and 
F1-score. It offers more precise top-N film suggestions, 
demonstrating its effectiveness compared to conventional 
recommendation algorithms. 
Figure 4. Performance metrics of RMSE and MAE. 
Figure 5. Performance Metrics: Proposed Hybrid Model. 
Conclusion 

Using the MovieLens_25M dataset, this study 
introduced a hybrid film recommendation system that 
combines several computational techniques to improve the 
precision and customisation of recommendations. This 
study developed a robust method for analysing and 
predicting user preferences in film consumption by 
combining Improved Singular Value Decomposition 
(SVD), Content-Driven K-Nearest Neighbours (KNN) 
employing cosine similarity, the IKSOM algorithm with 
Eisen cosine correlation distance, and silhouette 
clustering. By using Improved SVD, the dataset's 
dimensionality was successfully decreased, maintaining 
crucial information and improving system performance. 
This fundamental method made it easier to use the 
Content-Driven KNN and IKSOM algorithms later on, 
which evaluated film similarity and minimised cluster 
overlap, respectively, with a high degree of accuracy. By 
optimising the number of clusters, silhouette clustering 
approaches helped to further refine the model and more 
closely customise the recommendations to the preferences 
of each individual user. In addition to addressing the 
shortcomings of each methodology separately, the 
combination of these approaches into a single hybrid 
strategy far surpassed previous models. The experimental 
findings show that the proposed hybrid model works 
much better than the baseline techniques, obtaining an 
RMSE of 0.423 and MAE of 0.216. Additionally, it 
improves precision (93.6%), recall (94.2%), and F1-score 
(93.4%). The proposed technique also 
demonstrates a high level of accuracy (97.3%) with a 
precision rate of 95.8%. These results emphasise the 
method's efficacy in minimising errors and enhancing the 
overall performance of the recommendation system. These 
outcomes demonstrate the value of integrating different 
recommendation methods and the system's capacity to 
provide highly precise movie recommendations. In 
terms of future research, the study establishes a strong 
foundation by indicating that the inclusion of further 
contextual factors like user demographics, temporal 
viewing patterns, or even social network analysis could 
improve the recommendations’ personalisation and 
accuracy even more. These developments may result in 
recommendation systems that are even more intelligent 
and dynamically adjust to the preferences and context of 
the user, boosting user satisfaction and engagement on 
digital entertainment platforms. 